Estimation System

This chapter discusses the nature of the estimation system. Topics include:

Framework for handling different data consistently

CUBE Analyst provides a framework for combining a variety of information to estimate an origin-destination (O-D) matrix. The system has the following characteristics:

Some or all of the types of information introduced in Common elements and variations may be used.
The system can work with little data, but the accuracy of the estimated matrix is improved as more data is input.
Different information is handled on a consistent basis.
The variability of data is explicitly accounted for.

Objectives

The aim of CUBE Analyst is to maximize the value of existing data and to limit the need for costly surveys. As such, it is mainly concerned with processing information in the best (statistical) manner, although the accuracy of the estimated matrix remains strongly affected by the amount and quality of the information that the user inputs.

Besides its role in estimating matrices for individual studies, CUBE Analyst is suited to use with regular surveys designed to keep matrix information up to date.

Handling data variability

CUBE Analyst explicitly considers the variability of data. Inevitably, the different data are inconsistent in what they suggest the estimated matrix should be. The inherent variability means that collected data items are merely a sample, and hence the values (even of simple traffic counts) may only be considered to fall within a range (a distribution). The width of this range reflects the confidence that may be placed in particular items.

CUBE Analyst therefore requires the user to input information about how confident they are that each data item is representative of the situation for which the matrix is to be estimated. The information is input as a nominal percentage sample value. In restricted circumstances, this may be an actual sample obtained in a survey. This information about the variability is used to determine the relative influence of each item of data in the estimation process: it acts algebraically as a weighting value and is referred to as a confidence level.
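As an illustration of a confidence level acting as a weight, the sketch below combines two inconsistent observations of the same flow. It is not CUBE Analyst's estimation equation: the function name weighted_estimate, the sample values, and the use of a simple weighted mean are all assumptions made for illustration.

```python
def weighted_estimate(observations):
    """Combine (value, confidence) pairs into one confidence-weighted mean.

    `confidence` stands in for the nominal percentage sample value.
    Treating it as a linear weight is an assumption of this sketch.
    """
    total_weight = sum(conf for _, conf in observations)
    if total_weight <= 0:
        raise ValueError("at least one observation needs a positive confidence")
    return sum(value * conf for value, conf in observations) / total_weight


# Two counts of the same link disagree: 1000 (confidence 80) and 1200 (confidence 20).
combined = weighted_estimate([(1000.0, 80.0), (1200.0, 20.0)])
print(combined)  # 1040.0: the result sits closer to the more trusted count
```

The item with the higher confidence pulls the combined value towards itself, which is the sense in which the confidence level determines relative influence.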

Options for users

The user does not have to use CUBE Analyst in one manner, but rather according to the information that is available and the context within which the matrix is required. Typically, the user will start with what information is to hand or may easily be collected. This provides a fast means of obtaining an initial matrix that can enable a study to proceed, at least for general investigations. Analysis of the resulting matrix and estimation statistics will show where there is greatest requirement for further quality data. CUBE Analyst is then used to integrate this new (and possibly different type of) data to produce an improved estimated matrix.

Considerations for users

CUBE Analyst involves the user in a number of stages:

Deciding what information to input

This will usually be all information already available, but new data will normally be appropriate for those parts of the study area where most change has taken place since previous surveys, or where traffic schemes or policy proposals require detailed analysis.

Identify notable features and data sources

Feature (changes in)	Example	Data
Car ownership	Traffic growth	Counts
Land use	New industry, shops; new car parking	Trip ends (generations and attractions)
Road/public transport network	New bypass; traffic management; new bus/rail services	Travel times, routing
Travel habits	Out-of-town shopping	Observed O-D patterns; PT operators' boarding and alighting surveys; vehicle licence plate surveys

Inputting data

Information may be input in the form of matrices, as trip ends, or as network-related information. This data is prepared by the user within CUBE, which offers a variety of modes of data entry. Extra information is required on data variability. This is input in the same form as the information to which it corresponds. Each data item, for example each count, trip end, etc., may have an individual confidence level attached to it, but in many cases global values will be used.
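The relationship between individual and global confidence levels can be sketched as a simple fallback. The class name CountItem, the field names, and the global value of 50 percent are hypothetical. The sketch shows only the idea that an item without its own confidence level takes the global one, not how CUBE stores the data.

```python
from dataclasses import dataclass
from typing import Optional

GLOBAL_COUNT_CONFIDENCE = 50.0  # hypothetical global value, in percent


@dataclass
class CountItem:
    screenline: str
    count: float
    confidence: Optional[float] = None  # None means "use the global value"

    def effective_confidence(self, global_value: float = GLOBAL_COUNT_CONFIDENCE) -> float:
        """Return the item's own confidence level, or the global one if unset."""
        return self.confidence if self.confidence is not None else global_value


counts = [
    CountItem("SL1", 1250.0),       # relies on the global value
    CountItem("SL2", 830.0, 90.0),  # individual confidence level attached
]
levels = [item.effective_confidence() for item in counts]
print(levels)  # [50.0, 90.0]
```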

Estimating the matrix

The matrix estimation stage simply requires the user to input the prepared files into CUBE Analyst. As described in Overview of CUBE Analyst, and in more detail in Chapter 4, Mathematical Background, CUBE Analyst performs a set of iterative calculations that automatically determine the statistically most likely matrix for the set of input data values provided.

The first time CUBE Analyst is run, it creates a set of files that can reduce the run times of subsequent runs. The saving arises either because the need to restructure data is avoided (the intercept file) or because an estimation can take advantage of previously calculated results (the gradient search file and the model parameter file).

This ability to benefit from a previous run of CUBE Analyst (for the same basic study) is usually used to assist in analyzing the consequences of changes in data values. For lengthy runs with large matrices, it also provides a means of breaking an estimation into more than one run, for convenience.

With an improved optimizer in CUBE Analyst and more powerful computers, such staging of estimations is now rarer, but it remains a typical feature of hierarchic estimations of extremely large matrices. Staging is assisted by the local matrix control file, which is open to editing so that estimations can be staged in a manner convenient to the user.

Analyzing the estimated matrix

It is natural and desirable to want to check the quality of the estimated matrix. A typical approach to checking quality might be to compare the estimated matrix with some observed data that has not been used in the estimation process. However, this approach is not usually appropriate for CUBE Analyst, which is designed to take advantage of all reasonably observed data. For example, if the estimated matrix implies link flows across a screenline that differ from those observed (this is easily checked by assigning the estimated matrix to the network), then the solution is to re-run the estimation, now incorporating the extra observed data.

The approach to analyzing the quality of the estimated matrix is, therefore, based on:

Comparing the estimated results with input data values
Checking the sensitivity of the results if data values are altered
Analyzing the estimation calculations
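The first of these checks, comparing estimated results with input data values, can be sketched for a single screenline. The sketch assumes that the share of each O-D pair's trips crossing the screenline is already known. The function names, the toy matrix, and the observed count of 120 are hypothetical, and the calculation stands in for a full assignment of the matrix to the network.

```python
def screenline_flow(matrix, crossing_share):
    """Sum each O-D pair's trips, weighted by the share that crosses the screenline."""
    return sum(matrix[od] * share for od, share in crossing_share.items())


def percent_difference(estimated, observed):
    """Signed difference of the estimated value from the observed value, in percent."""
    return 100.0 * (estimated - observed) / observed


# Hypothetical estimated matrix (trips per O-D pair) and crossing shares.
estimated_matrix = {("A", "B"): 100.0, ("A", "C"): 50.0, ("B", "C"): 80.0}
crossing_share = {("A", "B"): 1.0, ("A", "C"): 0.5}

flow = screenline_flow(estimated_matrix, crossing_share)  # 100*1.0 + 50*0.5
diff = percent_difference(flow, observed=120.0)
print(flow, round(diff, 2))  # 125.0 4.17
```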

Besides information output by CUBE Analyst itself, extensive use is made of other Bentley programs for creating tabulations and graphic displays which highlight different characteristics of the estimated matrix.

Improving the estimated matrix

Deficiencies in the quality of the estimated matrix, when they are signalled by the results of the analysis phase, are remedied by improving the quality or quantity, or both, of the input data. The analysis phase can provide strong pointers as to which data is contributing to quality problems and hence where the user can focus attention.

Estimating highway and public transport matrices

For much of the time, it is not necessary to distinguish between the cases of estimating matrices for use with highways and public transport analysis; the same principles apply to each. However, there are a number of points to note. The first one is that the units of the matrices are usually in terms of vehicles for highways, and in terms of passengers for public transport.

Much of the data and methods of processing are identical for both highways and public transport, but the routing information is derived in quite different ways. There is also the concept of line groups, which only applies to public transport and not to highways.

Assumptions about the quality and quantity of data vary between the modes. Link count data is more readily, and accurately, available for highways than for public transport. Public transport is often more reliant on part-trip data, as obtained from boarding and alighting surveys. This form of data may be obtained from licence plate matching surveys for highways.

Overview of CUBE Analyst

CUBE Analyst’s operations can be considered as a series of activities:

1. Data input and restructuring

For the most part, CUBE Analyst simply reads the user’s set of input data at this initial stage. However, CUBE Analyst also analyzes and restructures routing information (from the TRIPS route choice probability (RCP) file or CUBE Voyager path file) and count data (from the screenline file) into a more concise and efficient file, called the intercept file. This restructuring can be relatively lengthy so, as noted in Considerations for users, it is possible to re-use an intercept file once it has been created. For CUBE Voyager users, the creation of the intercept file is handled by the HIGHWAY program.
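The kind of restructuring involved can be pictured with the sketch below, which folds per-link routing records and link counts into one compact record per screenline. The actual layout of the intercept file is not described here, so the record structure, the function name build_intercepts, and all of the data are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_intercepts(route_records, screenline_links, link_counts):
    """Group routing shares and counts by screenline.

    route_records: (origin, destination, link, probability) tuples.
    screenline_links: screenline name -> list of counted links.
    link_counts: link -> observed count.
    """
    link_to_screenline = {
        link: name for name, links in screenline_links.items() for link in links
    }
    shares = {name: defaultdict(float) for name in screenline_links}
    for origin, destination, link, probability in route_records:
        name = link_to_screenline.get(link)
        if name is not None:  # links not on any screenline are dropped
            shares[name][(origin, destination)] += probability
    return {
        name: {
            "count": sum(link_counts[link] for link in links),
            "shares": dict(shares[name]),
        }
        for name, links in screenline_links.items()
    }


# Hypothetical inputs: one screenline made up of two counted links.
route_records = [
    (1, 2, "L10", 0.5),
    (1, 2, "L11", 0.25),
    (1, 3, "L10", 1.0),
    (2, 3, "L99", 1.0),  # not on a screenline, so not retained
]
intercepts = build_intercepts(
    route_records,
    screenline_links={"SL1": ["L10", "L11"]},
    link_counts={"L10": 400.0, "L11": 150.0},
)
print(intercepts)
```

Because the result depends only on the routing and count inputs, it can be kept and re-used for later runs as long as those inputs are unchanged.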

2. Calculation initiation

The main CUBE Analyst calculations may be viewed as a search for the statistically most likely matrix, given the set of input data values. As this search relates, typically, to many thousands of matrix cell values, the manner of searching is a critical aspect of CUBE Analyst.

A calculation called the method of scoring directs the start of the searching process. This calculation is always done as the first stage of the estimation calculation, and it may be repeated later, according to the settings of CUBE Analyst’s ITERH parameter. (This determines the number of iterations between gradient search matrix calculations.)

There is a choice of strategy here. The default method for running CUBE Analyst spends time on the method of scoring calculation in order to limit subsequent calculations. CUBE Analyst also calculates a suitable value for ITERH. However, it is open to the user to override this strategy by:

Changing the setting of the IHTYPE parameter (used to determine the optimization process) of CUBE Analyst from its default in order to avoid the method of scoring. This reduces the associated calculation time, but means that the searching process is initially less well directed and so the net calculation time may still be longer.
Setting ITERH to a lower value than the default, which means that the searching process is re-appraised by further application of the method of scoring. This may be suitable when there are signs that the optimizer is not able to determine a convergent solution in a reasonable number of iterations.

The user should note that these options for tuning the performance of CUBE Analyst exist, but should not necessarily be concerned to apply them, as the default operation is usually entirely satisfactory. It requires some experience with a particular estimation problem to determine its best strategy.
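The effect of lowering ITERH can be pictured with a small scheduling sketch. It assumes that the method of scoring is applied at the first iteration and then again every ITERH iterations. The exact scheduling used by CUBE Analyst is not specified in this chapter, and the function name scoring_iterations is hypothetical.

```python
def scoring_iterations(total_iterations, iterh):
    """Return the (0-based) iterations at which scoring is assumed to be applied."""
    if iterh < 1:
        raise ValueError("iterh must be at least 1")
    return [k for k in range(total_iterations) if k % iterh == 0]


print(scoring_iterations(12, 4))  # [0, 4, 8]
print(scoring_iterations(12, 2))  # [0, 2, 4, 6, 8, 10]
```

Under this assumption, halving ITERH doubles the number of times the search is re-appraised, at the cost of the extra scoring calculations.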

3. Function evaluation

Function evaluation is the term used to describe the calculation of a series of estimation results. These are calculated by way of an estimation equation (function). The estimation equation calculates the values of the estimated cells according to the current values of a series of model parameters. There are a large number of model parameters: the number is usually twice the number of zones, plus the number of screenlines.

These model parameters have an initial value of 1.0, so the initial function evaluation (usually) results in an estimated matrix that is identical to the old (prior) matrix; see Prior trip matrix.
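A minimal sketch of a function evaluation follows. The actual estimation equation is given in the mathematical background, so the multiplicative form used here is an assumption: one parameter per origin zone (a), one per destination zone (b), and one per screenline (x) raised to the share of the cell's trips crossing that screenline. The function name evaluate and the toy data are hypothetical. The sketch shows the parameter count and why starting values of 1.0 reproduce the prior matrix.

```python
def evaluate(prior, a, b, x, shares):
    """Scale each prior cell by its origin, destination and screenline parameters."""
    n = len(prior)
    estimated = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            factor = a[i] * b[j]
            for k, x_k in enumerate(x):
                # shares[k] maps (i, j) to the share of the cell crossing screenline k
                factor *= x_k ** shares[k].get((i, j), 0.0)
            estimated[i][j] = prior[i][j] * factor
    return estimated


# Hypothetical prior matrix: 3 zones, 1 screenline.
prior = [[0.0, 10.0, 20.0], [5.0, 0.0, 8.0], [12.0, 4.0, 0.0]]
shares = [{(0, 2): 1.0, (1, 2): 0.5}]
zones, screenlines = len(prior), len(shares)

a = [1.0] * zones        # origin parameters
b = [1.0] * zones        # destination parameters
x = [1.0] * screenlines  # screenline parameters
n_params = len(a) + len(b) + len(x)  # 2 * zones + screenlines = 7

estimated = evaluate(prior, a, b, x, shares)
print(n_params, estimated == prior)  # 7 True

# Raising one origin parameter scales the whole of that row.
doubled = evaluate(prior, [2.0, 1.0, 1.0], b, x, shares)
print(doubled[0])  # [0.0, 20.0, 40.0]
```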

4. Optimization

The optimizer is a central feature of CUBE Analyst; there are two critical elements to it:

a. Objective function: This provides a criterion by which the optimizer can determine whether one value of a particular cell is better than another value. The section Maximum likelihood objective function explains how this criterion is derived from statistical maximum likelihood theory by rigorous mathematical calculation. Hence, CUBE Analyst defines "better" as statistically more likely.

b. Set of search directions and a step length: The optimizer alters the model parameter values, from their starting point of 1.0, to seek an estimated matrix that is an improvement on its current estimate. The search direction determines, for any cell in the matrix, whether model parameters should be increased or decreased, and the step length defines by how much.

The final values of the model parameters are available to view as the model parameter file, so it is possible to see how they have been changed from 1.0.

5. Iterations and convergence

After the optimizer has calculated new model parameter values, the function evaluation process is repeated to obtain the latest estimated matrix (and its derivative values). This overall process is repeated in a series of iterations; at each iteration the optimizer ensures that the new estimated matrix is an improvement on (more likely than) the previous one. Because there are so many cells to estimate, and CUBE Analyst does not constrain them to integer values, it is nearly always possible to make some improvement, however small. Therefore, it is necessary to define a criterion to determine when the iterations have reached an acceptable solution. In CUBE Analyst, this criterion is set by the UTOL (user tolerance) control parameter. UTOL sets a minimum value on the step length that the optimizer is allowed to use, as very small step lengths indicate that the optimizer is making correspondingly small changes to the estimated matrix. It is usual to leave UTOL at its default value, and allow CUBE Analyst to run until it terminates with a converged message.
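The interplay of search direction, step length and a UTOL-style stopping rule can be sketched on a toy problem. The sketch is not CUBE Analyst's optimizer: the quadratic objective stands in for the likelihood function, the step-halving search is a generic technique chosen for brevity, and the names optimise, utol and targets are hypothetical.

```python
import math


def objective(params, targets):
    """Toy stand-in for the likelihood: larger is better, maximum at the targets."""
    return -sum((p - t) ** 2 for p, t in zip(params, targets))


def gradient(params, targets):
    return [-2.0 * (p - t) for p, t in zip(params, targets)]


def optimise(targets, utol=1e-6, max_iterations=10_000):
    params = [1.0] * len(targets)  # parameters start at 1.0
    best = objective(params, targets)
    step = 1.0
    for iteration in range(1, max_iterations + 1):
        grad = gradient(params, targets)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            return params, iteration, True  # exactly at the optimum
        direction = [g / norm for g in grad]  # unit-length search direction
        trial = [p + step * d for p, d in zip(params, direction)]
        value = objective(trial, targets)
        if value > best:
            params, best = trial, value  # accept only a genuine improvement
        else:
            step /= 2.0  # no improvement: shorten the step
            if step < utol:
                return params, iteration, True  # step below tolerance: converged
    return params, max_iterations, False


params, iterations, converged = optimise([1.5, 0.8])
print(converged, [round(p, 4) for p in params])
```

A smaller tolerance lets the search continue with ever shorter steps, so it trades additional iterations for progressively smaller changes to the result.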